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Abstract In the study of long-time correlations extremely long orbits must be cal- 
culated. This may be accomplished much more reliably using fixed-point arithmetic. 
Use of this arithmetic on the Cray-1 computer is illustrated. 



Introduction 

There has been considerable interest recently in simple dynamical systems which 
exhibit very complicated behavior. One of the simplest such systems is a two- 
dimensional area-preserving map. This shares many of the properties of Hamiltonian 
systems with two degrees of freedom. The phase space for these systems is typically 
divided in integrable and nonintegrable (or stochastic) portions. An interesting prob- 
lem is the behavior of an orbit in the nonintegrable portion of phase space when it 
approaches an integrable portion. This can introduce very long-time correlations into 
the stochastic orbits. The first systematic study of this problem was given by Chan- 
non and Lebowitz, 1 who studied orbits in the Henon map. 2 This study was based on 
7750 orbits of length 10 4 . However, subsequent studies have recognized that longer 
orbits must be considered in order to determine the long-time behavior accurately. 
For example, in their work on the whisker mapping Chirikov and Shepelyansky 3 stud- 
ied a single orbit of length 10 8 . More ambitious calculations were performed by the 
author 4 on the periodic quadratic mapping 

Q ■ y'-y = g(x), x'-x = y', (1) 

where 

g(q.\ / ^("^ ^0' -^min X < ^maxi 

\g(x±L), otherwise, 
and L = x max — £ min > 0. In this study 1600 orbits of length 2 x 10 9 were used (i.e., 
a total of 3.2 x 10 12 iterations). 



*This work was supported by the United States Department of Energy under Contract 
DE-AC02-76-CHO-3073. 
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The Numerical Problem 

In a calculation of this magnitude, we must ask whether the results obtained on a 
computer have any relevance to the study of the exact system. The problem arises 
because only a finite set of numbers can be represented on a computer. Consequently, 
every orbit in a numerical mapping is periodic. The numerical mapping does not 
give the generic behavior for an exact mapping in which periodic orbits are a set of 
measure zero. Nevertheless, useful information can be obtained if the average period 
T of orbits in the numerical mapping exceeds the time in which we are interested. 
On the other hand, the numerical calculations are useless if T is less than the time 
in which we are interested. 

For a two-dimensional map, such as Eq. (1) implemented in floating-point arith- 
metic, we estimate that T ~ 2 p / 2 where p is the number of bits of precision. On the 
Cray— 1, we have p = 48 and T ~ 10 7 . Therefore, single-precision floating-point 
arithmetic on the Cray-1 cannot be used to study orbits of length 10 9 . One possible 
solution (the brute-force approach) is to go to double precision. This extends T to 
about 10 14 but at a cost of a factor of 2-4 in speed. It is preferable, however, to 
understand what defects in the floating-point number system cause T to be as short 
as it is, and then to employ a system of arithmetic which doesn't have such defects. 

We are approximating a mapping Q (the "exact" mapping) with another map- 
ping Q* (the realization of Q on a computer, the "numerical" mapping). We can 
estimate the error in a single iteration of the mapping as 2~ p . As we iterate the map- 
ping, the error grows. However, we can seek to control the error by making sure that 
Q* has some of the same properties as Q. Indeed if Q* has enough of the "interesting" 
properties of Q, we might tolerate quite a large error in a single iteration. 

(A parallel situation exists in the approximation of the collision operator in a 
plasma by the Landau collision operator. This is known to be in error by about 5%. 
However, because the Landau operator conserves all the quantities conserved by the 
exact collision operator — number, momentum, and energy — and because it has an 
H theorem, we are sure that the errors won't affect anything "important." Indeed, 
for these reasons, most plasma physicists are happy to regard the Landau collision 
operator as exact!) 

Perhaps the most important property of Q is that it is area-preserving. It 
is known that a small amount of dissipation greatly alters a mapping. Now Q* is a 
mapping defined on a discrete set of numbers so that the area-preserving property has 
to be translated into the analogous property for discrete mappings. However, because 
the floating-point number system is nonuniform, this property is very complicated and 
consequently difficult to implement. On the other hand, if a uniform number system 
is used, then area-preservation in Q corresponds to the mapping Q* being one-to- 
one. Such a number system is implemented on most computers and is called the 
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"fixed-point" number system. 
Fixed-Point Numbers 

Floating-point numbers are conventionally represented as 2 m x / where | < / < 1 
and m can vary (the binary point can float). The fraction / and the exponent m are 
then stored in different parts of the computer word. Floating-point numbers are ideal 
for representing numbers which may be very small or very large. In typical mapping 
calculations, this flexibility is not needed. The numbers representing the coordinates 
are usually bounded, so we do not need to be able to represent very large numbers. 
Furthermore, the added precision available for small floating-point numbers is wasted 
since these are often added to much larger numbers. 

Fixed-point numbers are also represented as 2 m x /, but now < / < 1 and 
m is fixed. (Note a possible confusion in the terminology: fixed-point numbers have 
nothing to do with the fixed-points of a mapping.) Now only / need be stored in the 
computer word, and it is the programmer's responsibility to remember m. (Because 
a whole word is used to store /, it is often possible to increase the precision. Thus 
on a pdp-10, 36 bits are available for /, which is considerably more than the 27 bits 
used to represent the fraction in floating-point numbers on that machine.) 

Addition and subtraction of fixed-point numbers are exact. Thus, if Eq. (1) is 
implemented in fixed-point arithmetic to give a numerical mapping Q* , then this can 
be represented by Eq. (1) with the exact g(x) replaced by an approximation, g*(x). 
Furthermore, Q*~ l exists because the operations in Q* can be reversed to give x and 
y in terms of x' and y' . In fact, there is a way to compute Q*~ l numerically without 
having to program the reverse operations. If we define an involution (J 2 = identity) 

J ■ y' = -y, x' = x-y, 

then we can show that Q*~ n = J~ 1 Q* n J, which may be exactly computed because 
J can be exactly carried out in fixed-point arithmetic. Thus each point on the plane 
has a unique successor (given by Q*) and a unique predecessor (given by Q*~ r ) and 
the mapping is one-to-one. Such a fixed-point mapping is a permutation on phase 
space. In contrast, a floating-point mapping is a many-to-one mapping, or a function 
on phase space. 

One-to-one mappings have been used in the study of dynamical systems by 
Miller and Prendergast 5 and by Rannou. 6 These authors implemented the mappings 
with integer arithmetic and used rather coarse representations of phase space. Thus 
Rannou divides phase space into a maximum of 800 x 800 cells. (A fixed-point 
mapping implemented on the Cray-1 has 2 48 x 2 48 cells.) However, these studies are 
useful for providing theoretical results about permutations. In particular, Rannou 6 
considers random permutations of N points defined as the ensemble of all possible 
such permutations. She shows that the average period of orbits is \{N + 1). The 
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average period of orbits in a random function is given by Knuth 7 as yViV/8 + ± . The 
reason that permutations have longer orbits is that they practice collision-avoidance. 
The only way such an orbit can become periodic is by landing on the initial point. 
In a random function, an orbit can become periodic by landing on any of its previous 
points. 

Now the mappings describing dynamical systems are definitely not random. In 
particular, they often possess symmetries (for example, the symmetry defined by the 
involution J connecting forwards and backwards trajectories). Rannou finds 6 that 
the average length of the orbits of a random symmetric permutation is reduced to 
0(N 1 ^ 2 ). We conjecture that a similar phenomenon occurs with symmetric functions 
reducing the average length of the orbits to 0{N 1 ^). Substituting N = (2 48 ) 2 , which 
is appropriate for a two-dimensional mapping on the Cray-1, we find that T ~ 10 14 
with fixed-point arithmetic and T ~ 10 7 with floating-point arithmetic. Clearly, 
fixed-point arithmetic allows us to examine much longer orbits. 

Operations on Floating-Point Numbers 

Let us briefly describe how to perform some useful operations on fixed-point numbers. 
We begin with the elementary operations: Addition and subtraction are performed 
with the same instructions as for integer addition and subtraction. Multiplication 
by a power of two 2 ±m x may be accomplished by a left/right shift of x by m bits 
(possibly with sign-extension). The computation of the integer part, int(x) = 
and the fraction part, fract(x) = x— \x\ , of a number may be performed by ANDing the 
computer word with appropriate masks. (The computation of int(x) for floating-point 
numbers needs a sequence of instructions in FORTRAN: xi = aint(x); if (xi.gt.x) then 
xi = xi — 1.0.) 

The method for multiplying by an arbitrary number depends on the computer. 
For instance, on a pdp-10 the mul instruction multiplies two 36-bit numbers to give 
the 72-bit product. This result can be shifted to align the binary-point with the 
assumed position within the word. Similarly, on the MC68000 microprocessor the 
MULS instruction gives the 32-bit product of two 16-bit quantities. (Higher precision 
multiplication can be performed with a sequence of these instructions.) The situation 
is slightly different on the Cray-1. When two numbers with zero exponent fields are 
multiplied using the floating-multiply instructions, the product of the fraction fields is 
returned with no normalization performed. Thus, if we represent fixed-point numbers 
as a Cray-1 word with the binary point 48 places from the right, then we can multiply 
two numbers in [0, 1) with the rounded floating-multiply instruction *r. 

Special functions may be calculated by converting the fixed-point number to 
floating-point, calling a library routine for the special function, and converting the 
result back to fixed-point. Alternatively, a subroutine calculating the special func- 
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tion directly with fixed-point arithmetic may be written. The program metafont by 
Knuth 8 contains a complete collection of routines for the elementary functions assum- 
ing a fraction size of 16 bits. An interesting shortcut is available for the calculation of 
random numbers. Usually, there is a library routine which returns a random floating- 
point number uniformly distributed in [0, 1). The fraction field of this floating-point 
number is uniformly distributed in thus a uniformly distributed fixed-point 

number may be obtained by taking all but the first bit of the fraction. 

Examples of Mappings 

Care must be taken when implementing an area-preserving mapping in fixed-point 
arithmetic to ensure that the numerical mapping is one-to-one. We have seen that 
the mapping Q may be implemented in a straightforward way. What about other 
mappings? The problem is well illustrated by the mapping x' = 2x, y' = \y. If this 
is coded as it stands, then on each iteration the least significant bit of y is lost and 
the mapping is clearly not invertible. Some way is therefore needed for remembering 
that lost bit. 

We begin by observing that a succession of nonlinear shifts 

x' = X + f(y), y' = y + g{x') 
is in a form that is invertible. The trick is to combine mappings of this form to 
produce the desired mapping. Thus the scaling mapping 

x' = sx, y' = y/s 
may be implemented as 

x* = x-y/s, y* = y + (s - l)x* , x' = x* + y* , y' = y* - (s - \)x'/s. 
Similarly, the rotation 

x' = cos 8 x — sin Oy, y' = sin 8 x + cos 9 y 
can be coded as 

x* = x — ay, y' = y + (3x* , x' = x* — ay' , 
where a = (1 — cos 8)/ sin 8 and f3 = sin#. Note that a diverges for 8 — > ir. But 
this is no serious limitation because rotations by multiples of ^tt can be realized 
exactly merely with sign changes; thus we can restrict \9\ < jir. Interestingly, this 
form for the rotation involves only three multiplications, and so may be faster than 
the "standard" form which involves four. If 8 is small, then the rotation can be 
approximated by 9 

x' = x-ey, y' = y + ex', 
where e = 2sin|#. In fact, with exact arithmetic this produces a slightly eccentric 
ellipse x 2 — exy + y 2 = const. 
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Realization on a Cray-1 

The biggest disincentive to using fixed-point arithmetic is the absence of any support 
for this arithmetic in languages like FORTRAN. Usually, one has to resort to assembly 
language to utilize this arithmetic. However, it is possible to make a substantial gain 
in speed by coding in assembly language, so the effort is often worth it. (Mapping 
calculations tend to benefit from hand-coding in assembly language because most of 
the running time is spent in a small mapping subroutine.) In order to illustrate the 
benefits, we show how the mapping Q may be written in assembly language CAL on 
the Cray-1. The reader is referred to the CAL manual 10 for further details on the 
language, and to the CFT manual 11 (Appendix F) for details on how to interface 
assembly language routines to a FORTRAN program. 

Fixed-point numbers are represented as a 64-bit word in twos-complement no- 
tation with the binary point 48 places from the left. The floating-point multiply 
instruction *r can multiply two such fixed-point numbers provided they lie in the 
range [0,1). It is convenient to scale the variables in Eq. (1) so that the mapping 
is periodic in x and y with period 1. The running time is slightly shortened if the 
inverse of the mapping is used. The mapping then becomes 4 

Q*' 1 : x' = x-y+\ (modi), y' = y - 2 m (ax' 2 - bx' + c) (modi), 

where a, b, c are all in [0, 1) and m is non-negative. In addition, we wish to keep 
track of when x or y leave the unit square. The speed can be increased by following 
64 particles at once (to make use of the vectorization capabilities of the Cray-1) and 
by iterating the mapping many times in one call. Thus a FORTRAN realization of this 
procedure (using floating-point arithmetic) reads 

subroutine quade(n, x, y, Iv, a, b, c, m) 
parameter (I = 64) 
logical Iv 

dimension x(l), y(l),xi(l),yi(l),lv(l) 
do 1 i = 1,1 
lv(i) — .false. 

1 continue 

do 2 j = 1, n 
do 2 i = 1,1 

x(i) — x(i) — y(i) + 0.5 
xi(i) = aint {x(i)) 

xi(i) = cvmgm ixiii) — 1.0, xiii), x(i)) 
x(i) = x(i) — xi(i) 

y(i) — y(i) — 2.0 ** m * (a * x(i) ** 2 — b * x(i) + c) 
yi(i) = aint (y(i)) 

yi(i) = cvmgm (yi(i) - 1.0, yi{i),y{i)) 
y{i) = y(i) - yi(i) 

lv(i) = lv{i) .or. (xi(i) .ne. 0.0) .or. (yi(i) .ne. 0.0) 

2 continue 
return 
end 
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This iterates n times, and returns x and y. (The CFT function cvmgm(a, b, c) 

returns a if c < and b otherwise.) The variable Iv returns true if the particle left 
the unit square during any of these n iterations. 

Let us see how this can be coded using fixed-point arithmetic in CAL. For 
brevity, the loading and storing of the arguments is omitted. At this point we have 
stored the vector length (64) in the register vl, the vectors x and y in the vector 
registers v2 and vl, the constants a, b, and c in s2, s3, and s4, the count n in al , 
and the shift min a^. We begin with some initialization: 



vO 





vO <- 


- (used for ORing with x 


si 


1 






si 


si <47 


0.5 




s6 


-si 


s6 <- 


- -0.5 


s5 


<48 


s5 <- 


- fraction mask 


al 


-al 


al <- 


n (flip sign of count) 



and y) 



We next turn to the main loop. Only the j loop in the FORTRAN code needs to be 
explicitly written. The i loop is implicitly performed by the vector instructions. The 
calculation is carried out entirely in registers. However, at the end of one iteration, 
x and y are in different registers (v7 and v4). It is therefore necessary to repeat the 
coding with interchanged registers to get x and y back to v2 and vl . The test for 
the particle having left the unit square Iv is given by ORing the successive x's and 
y's together into register vO. At the end we check whether the integer part of vO is 
nonzero. We are able to postpone the operation y <— fract(y) until the end. (With 
floating-point arithmetic, this would result in a loss of precision, so it is necessary to 
extract the fractional part with each iteration.) 



v3 


s6 + vl 


y-0.5 


v5 


v2 - v3 


x <— x — y + 0.5 


vl 


s5 & v5 


x <— fract(x) 


v6 


s2 *r vl 


ax 


v2 


vO ! v5 


Iv <— Iv OR int(x) / 


v3 


v6 *r vl 


ax 2 


v4 


s4 + v3 


ax 2 + c 


s0 


s3 *r s4 


use multiply unit for 1 cycle 


vO 


s3 *r vl 


bx 


v6 


v4 — vO 


(ax 2 — bx + c) 


v5 


v6 < a4 


2 m (ax 2 -bx + c) 


v4 


vl — v,5 


y <- y - 2 m (ax 2 - bx + c) 


vO 


v2 ! v4 


Iv <— Iv OR int(y) / 


al 


al + 1 


increment loop counter 


v3 


s6 + v4 


now repeat everything with 


v5 


v7-v3 


vl ^ v4 and v2 ^ vl 


al 


al + 1 




aO 


al 




jam 


loop 


jump back if more to do 



We end by taking the fractional part of y and by testing vO for a nonzero integer 
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part. 



v3 


s5 & vl 


y <- fract(y) 


a3 


48 


fraction shift 


v6 


vO > a3 


shift out fraction part of vO 


vm 


v6, n 


Iv <— int(y) ^ OR int(x) / 


si 


vm 


transfer to si 



At this point x and y are in v2 and u3, and Iv is in si . 

There are two sources of speed on the Cray— 1. The first is the ability to 
process vectors. This enables a given functional unit to produce one result every 
cycle (12.5ns). Utilizing this feature in CAL is relatively easy. The second source of 
speed is the ability of different functional units to be operating at the same time. 
Depending on the degree of overlap, this can speed up the program by a factor of 
1.5-3. However, taking advantage of this feature is made difficult by a complicated 
set of rules for when a particular instruction can issue. An extremely useful tool is 
the timing code CYCLES 12 which produces a detailed timing analysis of the code. The 
application of this code to the main loop of the mapping routine gives: 







W 


D 


I 


C 


O 


F 


R 


v3 


s6 + vl 









5 


64 


68 


69 


v5 


v2 - v3 


68 


25 


69 


74 


133 


137 


138 


vl 


s5 & v5 


4 


10 


74 


78 


138 


142 


142 


v6 


s2 *r vl 


3 


10 


78 


87 


142 


146 


151 


v2 


vO ! v5 


63 


01 


142 


146 


206 


210 


210 


v3 


v6 *r vl 


8 


25 


151 


160 


215 


219 


224 


v4 


s4 + v3 


8 


10 


160 


165 


224 


228 


229 


sO 


s3 *r s4 


58 


01 


219 


226 








vO 


s3 *r vl 






220 


229 


284 


288 


293 


v6 


v4 — vO 


8 


10 


229 


234 


293 


297 


298 


v5 


v6 < a4 


4 


10 


234 


240 


298 


302 


304 


v4 


vl — v5 


69 


21 


304 


309 


368 


372 


373 


vO 


v2 ! v4 


4 


10 


309 


313 


373 


377 


377 


al 


al + 1 






310 


312 








v3 


s6 + v4 


62 


05 


373 


378 


437 


441 


442 



For each instruction is given the wait time, the delay code, the issue time, the chain 
slot time, the operand ready time, the functional unit ready time, and the result 
ready time. The times are all in units of the clock of the Cray-1 namely 12.5 ns. The 
delay code is an octal number describing why the instruction could not issue. The 
meanings of the bits in this code are 

1 functional unit not ready 

2 result register not ready 

4 operand register not ready 
10 waiting for chain slot 
20 missed chain slot 

From the last line in the timing analysis, we see that one complete iteration (for 64 
particles) takes 373 cycles or 73ns/particle. Since there are 12 vector instructions in 
one iteration, the machine is computing results at the rate of about 2/cycle. Very 
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little improvement is possible beyond this because the vector add unit is busy 91% 
of the time. For comparison, if the same mapping is implemented with floating-point 
in FORTRAN, it takes 410 ns/particle/iteration, a factor of 5.6 slower. 

The complexity of timing on the Cray-1 can be understood by considering the 
"dummy" scalar multiply which issues at 219. This produces no useful result, but 
causes the next vector multiply instruction to issue one cycle later. Because of this, 
the vector add unit is ready at the chain slot time for this instruction (229) and 
the next instruction chains to this one. Without this dummy instruction, the vector 
multiply would issue at 219, the vector add would then miss the chain slot and have 
to wait until time 292 to issue; i.e., there would have been an additional delay of 63 
cycles. 

Conclusions 

In this paper, we have shown that the area-preserving nature of mappings imple- 
mented on a computer can be preserved by using fixed-point arithmetic. Because 
of this, much longer orbits can be studied. The precision is comparable to that 
of floating-point numbers; however, round-off error is much easier to control with 
fixed-point arithmetic. At present, access to fixed-point numbers is through assem- 
bly language. However, it is often faster than floating-point arithmetic. Indeed, 
since floating-point instructions are not available on several modern microprocessors, 
fixed-point arithmetic would be a natural way of studying mappings on such devices. 
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